ASYMPTOTIC EQUIVALENCE OF EMPIRICAL 
LIKELIHOOD AND BAYESIAN MAP 

By Marian Grendar and George Judge 

Bel University and University of California, Berkeley 

In this paper we are interested in empirical likelihood (EL) as a 
method of estimation, and we address the following two problems: 
(1) selecting among various empirical discrepancies in an EL frame- 
work and (2) demonstrating that EL has a well-defined probabilistic 
interpretation that would justify its use in a Bayesian context. Using 
the large deviations approach, a Bayesian law of large numbers is 
developed that implies that EL and the Bayesian maximum a pos- 
teriori probability (MAP) estimators are consistent under misspeci- 
fication and that EL can be viewed as an asymptotic form of MAP. 
Estimators based on other empirical discrepancies are, in general, 
inconsistent under misspecification. 

1. Introduction. Owen's empirical likelihood (EL) theorem ([30] and 
[31]) provides under traditional assumptions a basis for forming confidence 
regions for multivariate means and parameters in estimating equations. The 
basic EL idea is to proceed as if the sample $X_1, X_2, \ldots, X_n$, drawn from an 
unknown distribution $r(x)$, is i.i.d. and can be modeled as a multinomial 
distribution based on the observations. Inference for the J unknown param- 
eters is based on K estimating equations and a nonparametric likelihood 
ratio statistic that asymptotically has a chi-square distribution. As a result, 
EL is an attractive orthodox semiparametric method of estimation and in- 
ference whose scope has been extended in several productive directions (e.g., 
see [17] and [41]). 

Building on Owen's EL insight, this paper is concerned with using em- 
pirical likelihood as a method of estimation (cf. [3, 19, 27, 31] and [32], 
among others). Through a Bayesian law of large numbers (BLLN, Theorem 
2.2, Section 2.2, see also Section 2.3) we establish in a Bayesian setting an 
asymptotic connection between the EL and maximum a posteriori proba- 
bility (MAP) estimators. The BLLN implies that under certain conditions 
the posterior measure asymptotically weakly concentrates on the MAP, or 
equivalently EL, estimators, even when the model is not correctly specified. 
The BLLN result is established via the large deviations (LD) approach (cf. 
Section 2.1). 

Typically, as noted above, EL estimators are formed in an Estimating 
Equations (EE) framework [14]. In this way EL combines the flexibility 
of the nonparametric approach with the advantages of a finite parametriza- 
tion. To be more specific, let us assume that a researcher is willing to specify 
only some features of the data-sampling distribution $r(x;\theta)$. These model 
features, that is, the model $\Phi(\Theta)$, can be characterized by estimating 
equations (cf. [31], Chapter 3.5 and [27], Chapter 11): 
$\Phi(\Theta) = \bigcup_{\theta\in\Theta}\Phi(\theta)$, where 
$\Phi(\theta) \triangleq \{q(x;\theta) : \int q(x;\theta)u_j(x;\theta) = 0,\ 1 \le j \le J;\ \int q(x;\cdot) = 1,\ q(x;\cdot) \ge 0\}$, 
$\theta \in \Theta \subseteq \mathbb{R}^K$. The number $J$ of estimating functions $u(\cdot)$ need not be equal to 
the number $K$ of parameters. Given the sample drawn from $r(x;\theta)$, the EL 
estimator $\hat\theta_{EL}$ of $\theta$ is obtained as a parametric component of 

(1.1) $$\hat q_{EL}(x;\hat\theta_{EL}) = \arg\sup_{\theta\in\Theta}\ \sup_{q(x_i;\theta)\in\Phi_n(\theta)}\ \sum_{i=1}^n \log q(x_i;\theta),$$

where $\Phi_n(\theta) = \{q(x_i;\theta) : \sum_{i=1}^n q(x_i;\theta)u_j(x_i;\theta) = 0,\ 1 \le j \le J;\ \sum_{i=1}^n q(x_i;\theta) = 1,\ q(x_i;\cdot) \ge 0\}$ 
is the empirical form of $\Phi(\theta)$. 
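
For the simplest case of a single estimating function $u(x;\theta) = x - \theta$ and a fixed $\theta$, the inner maximization in (1.1) can be sketched numerically. The sketch assumes the standard Lagrangian form of the solution, $q(x_i;\theta) = 1/(n(1 - \lambda u_i))$, with the scalar $\lambda$ chosen so that the empirical estimating equation holds. The function name `el_weights`, the safeguarded Newton iteration and the toy sample are illustrative assumptions, not taken from the paper.

```python
def el_weights(xs, theta, iters=50):
    """EL weights for u(x; theta) = x - theta (illustrative helper).

    Uses q_i = 1 / (n * (1 - lam * u_i)), where lam solves
    sum_i u_i / (1 - lam * u_i) = 0.
    """
    u = [x - theta for x in xs]
    if not (min(u) < 0.0 < max(u)):
        raise ValueError("theta must lie strictly inside the sample range")
    n = len(xs)
    lam = 0.0
    for _ in range(iters):
        g = sum(ui / (1.0 - lam * ui) for ui in u)              # constraint residual
        dg = sum(ui * ui / (1.0 - lam * ui) ** 2 for ui in u)   # derivative, > 0
        step = g / dg
        new = lam - step
        # Halve the Newton step until every weight stays positive.
        while any(1.0 - new * ui <= 0.0 for ui in u):
            step /= 2.0
            new = lam - step
        lam = new
        if abs(step) < 1e-14:
            break
    q = [1.0 / (n * (1.0 - lam * ui)) for ui in u]
    return q, lam


sample = [1.0, 2.0, 4.0, 7.0]
weights, multiplier = el_weights(sample, 3.0)
```

At $\theta$ equal to the sample mean the multiplier is zero and the weights are uniform; for other values of $\theta$ inside the sample range the weights are tilted so that the weighted mean equals $\theta$.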

The asymptotic performance of $\hat\theta_{EL}$ has been studied by Qin and 
Lawless [32]. The same asymptotic properties are exhibited by the exponential 
tilting estimator, which results when in (1.1) the log-likelihood is replaced 
by the negative Kullback-Leibler discrepancy (empirical entropy) 
$-\sum_{i=1}^n q(x_i;\theta)\log(q(x_i;\theta)/\mu(x_i))$ of $q$ with respect to the uniform 
probability mass function $\mu(\cdot)$ (cf. [18, 23] and [27]). Also, in [3], the Euclidean 
distance $\sum_{i=1}^n (q(x_i;\theta) - \mu(x_i))^2$ is employed and in recent years the 
Cressie-Read [4] family of discrepancy measures has been used in the EL framework. 
When the model is correctly specified, the resulting estimators are consis- 
tent regardless of the discrepancy measure used. Since in practice the model 
is rarely specified correctly, it is of interest to study consistency of various 
EL estimators under statistical model misspecification. 
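
To make the comparison of discrepancies concrete, the following sketch computes the exponential tilting and the Euclidean weights under the single estimating function $u(x;\theta) = x - \theta$. It assumes the usual closed forms: tilting weights proportional to $\exp(\lambda u_i)$ with $\lambda$ minimizing $\sum_i \exp(\lambda u_i)$, and Euclidean weights obtained by solving the two linear constraints. The helper names and the toy sample are hypothetical.

```python
import math


def et_weights(xs, theta, iters=100):
    """Exponential tilting weights, q_i proportional to exp(lam * u_i)."""
    u = [x - theta for x in xs]

    def objective(l):
        return sum(math.exp(l * ui) for ui in u)

    lam = 0.0
    for _ in range(iters):
        w = [math.exp(lam * ui) for ui in u]
        grad = sum(ui * wi for ui, wi in zip(u, w))
        hess = sum(ui * ui * wi for ui, wi in zip(u, w))
        step = grad / hess
        # Backtrack so that the convex objective does not increase.
        while objective(lam - step) > objective(lam) and abs(step) > 1e-16:
            step /= 2.0
        lam -= step
        if abs(step) < 1e-14:
            break
    w = [math.exp(lam * ui) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]


def euclidean_weights(xs, theta):
    """Weights closest to uniform in squared distance (may be negative)."""
    n = len(xs)
    u = [x - theta for x in xs]
    u_bar = sum(u) / n
    s = sum((ui - u_bar) ** 2 for ui in u)
    return [1.0 / n - u_bar * (ui - u_bar) / s for ui in u]


sample = [1.0, 2.0, 4.0, 7.0]
w_et = et_weights(sample, 3.0)
w_eu = euclidean_weights(sample, 3.0)
```

Both weight vectors satisfy the same two constraints yet differ from each other, and the Euclidean weights, unlike the tilting ones, are not guaranteed to be nonnegative.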

Recognizing the importance of the statistical model in the estimation pro- 
cess, we study consistency under misspecification in the Bayesian setting. 
By means of the LD approach we obtain the Bayesian law of large num- 
bers (Theorem 2.2). The BLLN together with Lemmas 2.1 and 2.2 imply 
that, in the Bayesian setting, MAP and EL estimators are consistent under 
misspecification. In the Bayesian setting, Euclidean and other nonlikelihood 
members of the Cressie-Read family may, in general, be inconsistent. The 
same holds for the posterior mean (cf. Section 2.3, Example 2.1). 

The BLLN also sheds new light on the problem of extending EL into 
a Bayesian method (cf. [26, 31, 33] and [35]). In [26] Lazar listed possible 
ways of turning EL into a Bayesian method and studied one of the possibil- 
ities within the framework of Monahan and Boos [28]. Schennach [35] pro- 
posed a specific prior over a set of sampling distributions to get a Bayesian 
procedure that admits an operational form similar to EL. In [33] a differ- 
ent prior over the set of probability measures is considered and a group of 
EL-like methods is obtained. The BLLN together with Lemmas 2.1 and 2.2 
imply that if certain requirements are put on an infinite-dimensional prior, 
the Bayesian MAP method [cf. equation (2.2)] leads asymptotically to the 
same point estimator(s) as empirical likelihood. Informally put, this means 
that EL, as a method of estimation, can be viewed as an asymptotic instance 
of MAP. Extension of the connection between EL and MAP into the field of 
inference remains an open problem; compare, for instance, Freedman's [10], 
where it is shown that the distributional asymptotic properties of maximum 
nonparametric likelihood (MNPL, a special, nonparametric case of EL) and MAP 
estimators might be different, since the Bernstein-von Mises theorem does 
not apply, even for simple infinite-dimensional models. 

1.1. Organization of the paper. In Section 2.1 the LD approach to Bayesian 
consistency is informally described. In Section 2.2 a basic framework is estab- 
lished, Bayesian nonparametric consistency is formally defined, L-divergence 
is introduced and the BLLN theorem is proved for the i.i.d. case. In Section 
2.3 the BLLN for the semiparametric model is discussed and it is demon- 
strated that the L-projection, singled out by the BLLN, is an asymptotic 
form of the EL and MAP estimators. In order to further explore consistency 
under misspecification and expand the scope of the related asymptotic 
connection between MNPL/EL and MAP, we also prove the BLLN for the 
multicolor Polya sampling process (Section 2.4), where the BLLN suggests 
two possible variants of MNPL. In Section 2.5, the BLLN is proved for right 
censored data, showing that the Kaplan-Meier estimator is an asymptotic 
form of Bayesian MAP. 

2. Bayesian LLN's, MNPL, EL and MAP. In general, a feasible set $\Phi(\Theta)$ 
of nonparametric sampling distributions which are indexed by a parameter $\theta$ 
can be formed in a way different from the conventional EE described above. 
The purely nonparametric $\Phi$ is contained in $\Phi(\Theta)$ as a special case. 

2.1. Large deviations approach to Bayesian consistency under misspecification. 
In a Bayesian framework, a prior distribution $\Pi$ over the set $\Phi(\Theta)$ 
is assumed, and it induces a prior distribution $\Pi(\theta)$ over $\Theta$. Assuming the 
Bayesian framework, it is of interest to know the sampling distribution(s) on 
which the posterior measure concentrates as the sample size $n$ gets large. 
The importance of the frequentist concept of consistency for Bayesian statis- 
tics can be justified both from subjectivist and objectivist Bayesian positions 
(cf. [40] or [13], Chapter 4). A formal definition of Bayesian consistency is 
given in Section 2.2. 

We use a large deviations (LD) (cf. Ben-Tal, Brown and Smith [1] and 
[2] and Ganesh and O'Connell [11]) approach to Bayesian nonparametric 
consistency. The approach results in a Bayesian-Sanov Theorem (BST) and 
its corollary the Bayesian law of large numbers (BLLN) that establishes the 
consistency. LD theory is a subfield of probability theory where, informally, 
the typical concern is about the asymptotic behavior, on a logarithmic scale, 
of the probability of a given event. The BST identifies the rate function gov- 
erning exponential decay of the posterior measure, and this in turn identifies 
the sampling distributions on which the posterior concentrates, as those dis- 
tributions that minimize the rate function. Currently used approaches to 
Bayesian nonparametric consistency (cf. [38]) do not recognize this concen- 
tration of the posterior measure as a solution of the optimization problem. 

The Bayesian law of large numbers may be, informally, stated as follows: 
if the prior over a set $\Phi$ of sampling distributions, which might not include 
the "true" distribution with probability density function $r$, satisfies certain 
conditions, then the posterior asymptotically concentrates (a.s. $r^\infty$) on weak 
neighborhoods of the L-projections of $r$ on $\Phi$. The L-projection $\hat q$ of $r$ on $\Phi$ 
is $\hat q = \arg\inf_{q\in\Phi} L(q\,\|\,r)$, where $L(q\,\|\,r)$ is the L-divergence of probability 
density function $q$ with regard to $r$. In the case of i.i.d. sampling, 
$L(q\,\|\,r) = -\int r\log q$. 
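
A discrete toy case may help fix ideas. The sketch below evaluates the L-divergence as $-\sum_i r_i\log q_i$ for PMFs on a common finite support and picks the L-projection from a finite candidate set. The finite support, the finite set `phi` and the numbers are assumptions made purely for illustration; the paper works with densities and weakly open sets.

```python
import math


def l_divergence(q, r):
    """L(q || r) = -sum_i r_i log q_i for PMFs on a common finite support."""
    return -sum(ri * math.log(qi) for qi, ri in zip(q, r) if ri > 0.0)


def l_projection(candidates, r):
    """Member of a finite candidate set with the smallest L-divergence from r."""
    return min(candidates, key=lambda q: l_divergence(q, r))


r = [0.5, 0.3, 0.2]  # "true" PMF, deliberately not among the candidates
phi = [[0.6, 0.2, 0.2], [0.4, 0.4, 0.2], [0.2, 0.3, 0.5]]
q_hat = l_projection(phi, r)
```

Since $-\sum_i r_i\log q_i$ equals the entropy of $r$ plus $\sum_i r_i\log(r_i/q_i)$, the L-projection is the candidate that minimizes the Kullback-Leibler divergence taken with $r$ as the reference measure, and $r$ itself attains the smallest possible value.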

Finally, let us note that the BST is a Bayesian counterpart of a Sanov 
theorem for empirical measures (cf. [34] and [5], Sections III and VII and 
references cited therein). The latter, as well as its corollary, the conditional 
law of large numbers, are basic results of large deviations (LD) theory (cf. 
[5] and [7]). The LD theorems for empirical measures have a bearing on the 
relative entropy maximization method. Kitamura and Stutzer [22] noted that 
the LD argument can be used also in the semiparametric EE setting, where 
it provides an underpinning to exponential tilting (see [18] and [22]), also 
known as the maximum entropy empirical likelihood [27] method. In fact, the 
work of Kitamura and Stutzer [23] served as a starting point for our attempt 
to provide a similar underpinning to the MNPL and EL methods (see also 
[16]). It turned out that this is only possible in a Bayesian framework. 

2.2. BLLN for i.i.d. sampling. Let $\mathcal{P}$ be the set of all probability 
measures on $(\mathbb{R}, \mathcal{B})$ which are dominated by the Lebesgue measure. Let $X_1, X_2, \ldots$ 
be i.i.d. random variables that take values in $(\mathbb{R}, \mathcal{B})$, with probability density 
function (PDF) $r$, where probability densities are denoted by lower case. $\mathcal{P}$ 
is endowed with weak topology. Let $\Phi \subseteq \mathcal{P}$. It is not assumed that $r$, the 
true sampling distribution, is necessarily in $\Phi$. Let $\sigma(\mathcal{P})$ be a Borel $\sigma$-field 
on $\mathcal{P}$. A prior $\Pi$ is put on $(\mathcal{P}, \sigma(\mathcal{P}))$ that is strictly positive over $\Phi$. 
The prior combines with data $X_1^n = X_1, X_2, \ldots, X_n$ to define the posterior 
distribution 

$$\Pi_n(Q\,|\,X_1^n) = \frac{\int_Q e^{-l_n(q)}\,\Pi(dq)}{\int_\Phi e^{-l_n(q)}\,\Pi(dq)},$$

where $l_n(q) \triangleq -\sum_{i=1}^n \log q(X_i)$, $Q \subseteq \Phi$. 

Let $d$ be a metric on $\mathcal{P}$. The sequence $\{\Pi_n(\cdot\,|\,X_1^n), n \ge 1\}$ is said to be 
$d$-consistent at $r$, if there exists a set $\Omega_0$ of sample sequences with 
$r(\Omega_0) = 1$ such that for $\omega \in \Omega_0$, for every neighborhood $U$ of $r$, 
$\Pi_n(U\,|\,X_1^n) \to 1$ as $n$ goes to infinity. If a posterior is $d$-consistent for 
any $r \in \mathcal{P}$, then it is said to be $d$-consistent. If 
the consistency holds for the Hellinger distance, then the posterior is strongly 
consistent. If convergence holds in weak topology, the posterior is said to be 
weakly consistent. In [39] a decision-theoretic argument is proposed in favor 
of weak consistency. Surveys of Bayesian nonparametric consistency can be 
found in [12, 13] and [40]. 

To the best of our knowledge, Ben-Tal, Brown and Smith [1] were the 
first to use an LD approach to Bayesian nonparametric consistency. The 
authors showed consistency for $X$ taking values from a finite set $\mathcal{X}$ and 
under a possibly misspecified model. Recently, Ganesh and O'Connell [11] 
independently established the first formal BST, for finite set $\mathcal{X}$ and a 
well-specified model. Here we develop the BST and the BLLN for $\mathcal{X} = \mathbb{R}$ and a 
possibly misspecified model. Using techniques other than LD, consistency 
in the Hellinger distance and under misspecification was studied by Kleijn 
and van der Vaart [24]. 

The key quantity that governs the LD exponential decay of the posterior 
$\Pi_n(Q\,|\,X_1^n)$ in the i.i.d. case is the L-divergence of $q \in \mathcal{P}$ with regard to 
$p \in \mathcal{P}$, $L(q\,\|\,p) \triangleq -\int p\log q$ (cf. [15]). In the discrete case, L-divergence 
appears in Freedman's ([9], Theorem 1) as "entropy." If $p$ is an empirical PMF, then 
L-divergence appears as Kerridge's inaccuracy ([21] and [25]), which is just the 
negative of the nonparametric likelihood. The L-projection $\hat q$ of $p$ on $Q \subseteq \mathcal{P}$ 
is $\hat q = \arg\inf_{q\in Q} L(q\,\|\,p)$. The value of L-divergence at an L-projection of $p$ 
on $Q$ is denoted by $L(Q\,\|\,p)$. 

Finally, let, for $p, q \in \mathcal{P}$, $\varepsilon > 0$, $B_\varepsilon(q,p) \triangleq \{q' \in \mathcal{P} : L(q'\,\|\,p) - L(q\,\|\,p) < \varepsilon\}$. 
For $A \subseteq \mathcal{P}$, $B_\varepsilon(A,p) \triangleq \{q \in \mathcal{P} : L(q\,\|\,p) - L(A\,\|\,p) < \varepsilon\}$. 

Using this notation, the BST can be stated as follows: 

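Theorem 2.1 (BST). Let $X_1^n$ be i.i.d. $r$. Let $Q$ and $\Phi$ be open in weak 
topology; $Q \subset \Phi \subseteq \mathcal{P}$. Let $L(Q\,\|\,r) < \infty$; for any $\varepsilon > 0$, let 
$\Pi(B_\varepsilon(Q,r)) > 0$ and $\Pi(B_\varepsilon(\Phi,r)) > 0$. Then, for $n \to \infty$, 

$$\frac{1}{n}\log\Pi_n(q \in Q\,|\,X_1^n) = -\{L(Q\,\|\,r) - L(\Phi\,\|\,r)\} \quad \text{a.s. } r^\infty.$$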
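Proof. For $S \subseteq \mathcal{P}$, let $l_n(S) \triangleq \inf_{q\in S} l_n(q)$. Let, for $\varepsilon > 0$, 
$B_\varepsilon^n(S) \triangleq \{q : l_n(q) - l_n(S) < \varepsilon\}$. Then, $\int_A e^{-l_n(q)}\,\Pi(dq)$, $A = \{Q, \Phi\}$, 
can be bounded as 

$$e^{-l_n(A)-\varepsilon}\,\Pi(B_\varepsilon^n(A) \cap A) \le \int_A e^{-l_n(q)}\,\Pi(dq) \le e^{-l_n(A)}.$$

By lower semicontinuity of L-divergence in the weak topology and a strong 
law of large numbers [which can be applied, since $L(Q\,\|\,r) < \infty$, by assumption], 
$\frac{1}{n}l_n(A) \to L(A\,\|\,r)$, a.s. $r^\infty$, as $n \to \infty$. Thus, it holds: 

$$\limsup_{n\to\infty}\frac{1}{n}\log\int_A e^{-l_n(q)}\,\Pi(dq) \le -L(A\,\|\,r).$$

So, $\limsup_{n\to\infty}\frac{1}{n}\log\Pi_n(Q\,|\,X_1^n) \le -\{L(Q\,\|\,r) - L(\Phi\,\|\,r)\}$. By the same argument 
(SLLN and continuity), for sufficiently large $n$, $\Pi(B_\varepsilon^n(A)) > 0$, since 
$\Pi(B_\varepsilon(A,r)) > 0$ by assumption. As $B_\varepsilon^n(A) \cap A \ne \emptyset$, thus 
$\lim_{n\to\infty}\frac{1}{n}\log\Pi(B_\varepsilon^n(A) \cap A) = 0$. Hence, 
$\liminf_{n\to\infty}\frac{1}{n}\log\Pi_n(Q\,|\,X_1^n) \ge -\{L(Q\,\|\,r) - L(\Phi\,\|\,r)\}$. □ 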

The posterior probability $\Pi_n(Q\,|\,X_1^n)$ decays exponentially fast with the 
decay rate $L(Q\,\|\,r) - L(\Phi\,\|\,r)$. The BST implies the Bayesian law of large 
numbers (BLLN). 
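
The decay rate can be checked numerically in a deliberately simplified setting. The sketch assumes a finite support, a uniform prior on a finite candidate set `phi` (which is not a weakly open set, so this only illustrates the rate, not the theorem's hypotheses), and a sample whose empirical frequencies coincide exactly with $r$. All names and numbers are illustrative.

```python
import math


def l_divergence(q, r):
    """L(q || r) = -sum_i r_i log q_i on a common finite support."""
    return -sum(ri * math.log(qi) for qi, ri in zip(q, r))


def log_lik(q, counts):
    """Log-likelihood of cell counts under the PMF q."""
    return sum(c * math.log(qi) for c, qi in zip(counts, q))


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_posterior_mass(subset, phi, counts):
    """log Pi_n(subset | data) under a uniform prior on the finite set phi."""
    num = log_sum_exp([log_lik(q, counts) for q in subset])
    den = log_sum_exp([log_lik(q, counts) for q in phi])
    return num - den


r = [0.5, 0.3, 0.2]
phi = [[0.6, 0.2, 0.2], [0.4, 0.4, 0.2], [0.2, 0.3, 0.5]]
Q = [phi[0], phi[2]]                      # excludes the L-projection phi[1]
n = 20000
counts = [int(n * ri) for ri in r]        # idealized sample: frequencies equal r

rate = min(l_divergence(q, r) for q in Q) - min(l_divergence(q, r) for q in phi)
scaled_log_posterior = log_posterior_mass(Q, phi, counts) / n
```

In this toy setting `scaled_log_posterior` is close to `-rate`, in line with the exponential decay described above.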

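Theorem 2.2 (BLLN). Let $\Phi \subseteq \mathcal{P}$ be open in weak topology. Let (1) for 
every $q \in \Phi$, $\Pi(B_\varepsilon(q,r)) > 0$ and (2) $L(\Phi\,\|\,r) < \infty$. Let 
$U = \bigcup_{k=1}^{\kappa} W(\hat q_k, \varepsilon)$ be a union of weak $\varepsilon$-balls $W(\hat q_k, \varepsilon)$ centered at 
L-projections $\hat q_k$, $k = 1, \ldots, \kappa$, $\kappa < \infty$, of $r$ on $\Phi$. Then, 

$$\lim_{n\to\infty}\Pi_n(q \in U\,|\,X_1^n) = 1 \quad \text{a.s. } r^\infty.$$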

Proof. Let $Q \subset \Phi$ be open sets in weak topology. First, let $Q$ be any 
set such that $\infty > L(Q\,\|\,r) > L(\Phi\,\|\,r)$. Then, assumptions of the BST are 
satisfied, and the theorem implies that $\Pi_n(Q\,|\,X_1^n) \to 0$, a.s. $r^\infty$, as $n \to \infty$. 
Note that $L(Q\,\|\,r) = \infty$ for such $Q$ that the L-projection $\hat q_Q$ of $r$ on $Q$ has 
support that is smaller than the support of $r$. However, for such a $\hat q_Q$, the 
posterior probability would be zero. The posterior thus concentrates on 
L-projections of $r$ on $\Phi$, provided that their support is not smaller than that 
of $r$. This is guaranteed by the assumption $L(\Phi\,\|\,r) < \infty$. □ 

The BLLN theorem is an extension of Schwartz' consistency theorem [36], 
to the case of a misspecified model. Assumptions 1 and 2 of the BLLN, called 
hereafter the Schwartz conditions, reduce in the well-specified case to the 
Kullback-Leibler support condition (cf. [36] and [13], Theorem 4.4.2). 

The next lemma points out that the Bayesian maximum a posteriori 
probability (MAP), which selects $\hat q_{MAP} = \arg\sup_{q\in\Phi}\Pi_n(q\,|\,X_1^n)$, satisfies the 
BLLN. 

Lemma 2.1. Let $\Phi \subseteq \mathcal{P}$ be open and the Schwartz conditions be satisfied. 
Then, as $n \to \infty$, the set of MAP distributions 
$\mathcal{M} = \{\hat q_{MAP} : \hat q_{MAP} = \arg\sup_{q\in\Phi}\Pi_n(q\,|\,X_1^n)\}$ converges (a.s. $r^\infty$) to the set of 
L-projections of $r$ on $\Phi$. 



Proof. Thanks to the Strong LLN (SLLN), which can be applied under 
the Schwartz condition 2, the conditions for the infimum of minus the 
logarithm of the posterior probability (positivity of which is guaranteed by 
the Schwartz condition 1) turn into those for L-projections. □ 

Directly from the strong LLN it follows that the maximum nonparametric 
likelihood (MNPL), which selects $\hat q_{MNPL} = \arg\inf_{q\in\Phi} l_n(q)$, satisfies the BLLN. 

Lemma 2.2. Let $\Phi$ be open and Schwartz condition 2 be satisfied. 
Then, as $n \to \infty$, the set of MNPL distributions converges (a.s. $r^\infty$) to the 
set of L-projections of $r$ on $\Phi$. 

Selection of a posterior mean or a sampling distribution that minimizes, 
say, the Kullback-Leibler distance $\int q\log(q/r)$ with regard to $q$, in a 
misspecified case, would in general violate the BLLN. 

The lemmas also mean that the MNPL and the MAP methods asymp- 
totically select the same sampling distribution(s). 
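
A small simulation can illustrate how MNPL and MAP come to agree. The sketch assumes a finite support, a finite candidate set that does not contain the sampling PMF `r` (so the model is misspecified), and a prior that strongly favors a candidate other than the L-projection. The seed, sample size and all names are arbitrary illustrative choices.

```python
import math
import random


def log_lik(q, sample):
    """Log-likelihood of an i.i.d. sample of cell indices under the PMF q."""
    return sum(math.log(q[x]) for x in sample)


def mnpl_and_map(candidates, prior, sample):
    """Indices selected by maximum likelihood and by maximum posterior."""
    ll = [log_lik(q, sample) for q in candidates]
    idx = range(len(candidates))
    mnpl = max(idx, key=lambda k: ll[k])
    map_ = max(idx, key=lambda k: ll[k] + math.log(prior[k]))
    return mnpl, map_


r = [0.5, 0.3, 0.2]                                   # sampling PMF
phi = [[0.7, 0.2, 0.1], [0.45, 0.35, 0.2], [0.2, 0.3, 0.5]]
prior = [0.90, 0.05, 0.05]                            # favors phi[0]

rng = random.Random(0)
sample = rng.choices([0, 1, 2], weights=r, k=5000)

l_div = [-sum(ri * math.log(qi) for ri, qi in zip(r, q)) for q in phi]
l_proj = min(range(len(phi)), key=lambda k: l_div[k])
selected = mnpl_and_map(phi, prior, sample)
```

With a sample of this size the prior's preference is overwhelmed by the likelihood, and both rules pick the candidate indexed by `l_proj`.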

Next, we turn to the semiparametric setting. 

2.3. BLLN for the semiparametric $\Phi(\Theta)$. Let $X$ be a random variable 
with probability density function $r(X;\theta)$ parametrized by $\theta \in \Theta \subseteq \mathbb{R}^K$. A 
Bayesian specifies a model $\Phi(\Theta)$ and puts a positive prior $\Pi$ over $\Phi(\Theta)$, 
which in turn induces a prior $\Pi(\theta)$ over $\Theta$; see Florens and Rolin [8], where 
several models are also worked out using a Dirichlet process prior. If the 
requirements of the BLLN are satisfied, then the posterior $\Pi_n(\cdot\,|\,X_1^n)$ 
concentrates on weak neighborhoods of L-projections $\hat q$ of $r$ on $\Phi(\Theta)$, 

$$\hat q(x;\hat\theta) = \arg\inf_{\theta\in\Theta}\ \inf_{q(x;\theta)\in\Phi(\theta)} L(q(x;\theta)\,\|\,r).$$

The most common form of $\Phi(\Theta)$ is the one defined by estimating 
equations (cf. Section 1). In this case, $\Phi(\theta)$ is also known as a linear family of 
distributions that we denote as $\mathcal{L}(u,\theta)$. The L-projection of $r$ on $\mathcal{L}(u,\theta)$ can 
be found by means of the following Theorem 2.3. To state it, we introduce a 
$\Lambda$ family of distributions and recall the concept of support of a convex set. 
Let $\Lambda$ be a family of probability density functions: 
$\Lambda(r,u,\lambda,\theta) \triangleq \{p \in \mathcal{P} : p = r[1 - \sum_{j=1}^J \lambda_j u_j(\cdot;\theta)]^{-1},\ \lambda \in \mathbb{R}^J\}$. 
The support $S(C)$ of a convex set $C \subset \mathcal{P}$ is 
just the support of the member of $C$ for which $S(\cdot)$ contains the support of 
any other member of the set. 
Theorem 2.3. Let $\Phi = \mathcal{L}(u,\theta)$. Let $r$ be such that $S(r) = S(\mathcal{L})$. 
Then, the L-projection $\hat q$ of $r$ on $\Phi$ is unique and belongs to the $\Lambda(r,u,\lambda,\theta)$ 
family; that is, $\mathcal{L}(u,\theta) \cap \Lambda(r,u,\lambda,\theta) = \{\hat q\}$. 

Proof. In light of Theorem 9 of [6] it suffices to check that 
$\hat q = r[1 - \sum_{j=1}^J \lambda_j u_j(\cdot;\theta)]^{-1}$, with $\lambda$ such that $\hat q \in \mathcal{L}(u,\theta)$, satisfies 
$\int_{S(r)} r(1 - q/\hat q) = 0$ for all $q \in \Phi$, which is indeed the case. □ 

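The estimator $\hat\theta$ can, thanks to convex duality, be obtained as 

$$\hat\theta = \arg\inf_{\theta\in\Theta}\ \sup_{\lambda\in\mathbb{R}^J} L(q(x;r,u,\lambda,\theta)\,\|\,r),$$

where $q(x;r,u,\lambda,\theta) \in \Lambda(r,u,\lambda,\theta)$. Since $r$ is in practice not known, 
Kitamura and Stutzer [22] suggested that $L(q(x;r,u,\lambda,\theta)\,\|\,r)$ be replaced by its 
estimate $L(q(x;\mu,u,\lambda,\theta)) \triangleq -\sum_{i=1}^n \log q(x_i;\mu,u,\lambda,\theta)$, where $q(x;\mu,u,\lambda,\theta)$ 
belongs to $\Lambda(\mu,u,\lambda,\theta)$ and $\mu$ is the uniform PMF on $X_1^n$. The resulting 
estimator 

(2.1) $$\hat\theta_{EL} = \arg\inf_{\theta\in\Theta}\ \sup_{\lambda\in\mathbb{R}^J} L(q(x;\mu,u,\lambda,\theta))$$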

is just the empirical likelihood (EL) estimator ([32] and [31]), since (2.1) is 
a convex dual problem to the optimization problem (1.1), by means of which 
EL is usually defined. Analogously to Lemma 2.2, it can be shown that the 
EL estimator $\hat q_{EL}(x;\hat\theta_{EL})$ asymptotically (a.s. $r^\infty$) turns into an L-projection 
of $r$ on $\Phi(\Theta)$. The same holds for the MAP estimator 

(2.2) $$\hat q_{MAP}(x;\hat\theta_{MAP}) = \arg\sup_{\theta\in\Theta}\ \sup_{q(x;\theta)\in\Phi(\theta)} \Pi_n(q(x;\theta)\,|\,x_1^n).$$

Hence, the EL and the MAP estimators are consistent under misspecification. 
This provides a basis for the EL approach as well as for the Bayesian 
MAP estimation. 
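
The inf-sup form (2.1) can be sketched for one estimating function $u(x;\theta) = x - \theta$. The code assumes $q_i = 1/(n(1 - \lambda u_i))$, as in the $\Lambda$ family with $\mu$ uniform, finds the inner supremum over $\lambda$ by bisection and then minimizes over a coarse grid of $\theta$ values. The grid, the sample and the function name `profile_L` are illustrative assumptions.

```python
import math


def profile_L(xs, theta, iters=100):
    """sup over lam of -sum_i log q_i, with q_i = 1 / (n (1 - lam u_i))."""
    n = len(xs)
    u = [x - theta for x in xs]
    # All 1 - lam * u_i stay positive for lam strictly between these bounds.
    lo, hi = 1.0 / min(u), 1.0 / max(u)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # g is increasing in lam; the concave objective peaks where g = 0.
        g = sum(ui / (1.0 - lam * ui) for ui in u)
        if g < 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return n * math.log(n) + sum(math.log(1.0 - lam * ui) for ui in u)


sample = [1.0, 2.0, 4.0, 7.0]
grid = [1.5 + 0.5 * k for k in range(11)]          # 1.5, 2.0, ..., 6.5
theta_hat = min(grid, key=lambda t: profile_L(sample, t))
```

In this just-identified toy case the minimizer is the sample mean, where the multiplier vanishes and the profile value equals $n\log n$; with more estimating functions than parameters the same inf-sup structure applies with a vector $\lambda$.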

EL estimators which are based on other discrepancy measures are, in 
general, not consistent when the model is not correctly specified. Example 
2.1 illustrates the inconsistency of the posterior mean. 

Example 2.1. Let $\Phi = \Phi_1 \cup \Phi_2$, where $\Phi_1 = \mathcal{L}(u,\theta)$ and 
$\theta \in \Theta_1 = \{\theta \in \mathbb{R} : \theta < \theta_1\}$. Similarly, $\Phi_2 = \mathcal{L}(u,\theta)$ and 
$\theta \in \Theta_2 = \{\theta \in \mathbb{R} : \theta > \theta_2\}$. Let $u(x,\theta) = x - \theta \in \mathbb{R}$. 
For a sampling distribution $r$, it is possible to find $\theta_2 > EX > \theta_1$ 
such that $L(\Phi_1\,\|\,r) = L(\Phi_2\,\|\,r)$. Assume this to be the case. Then, under 
the Schwartz conditions, the posterior concentrates on the weak balls 
centered at the L-projections of $r$ on $\Phi_1$ and $\Phi_2$, rendering the posterior mean 
inconsistent. 
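
A symmetric discrete instance shows how such a tie can arise. The sketch assumes a three-point PMF `r` with mean zero and evaluates the L-projections on the mean-constrained families at the boundary values $\theta_1 = -0.5$ and $\theta_2 = 0.5$, using the form $q_i = r_i/(1 - \lambda u_i)$ of Theorem 2.3 with bisection for $\lambda$. The support, the probabilities and the function name are illustrative choices.

```python
import math


def l_projection_on_mean(xs, r, theta, iters=100):
    """L-projection of r on {q : sum_i q_i (x_i - theta) = 0} and its L-divergence."""
    u = [x - theta for x in xs]
    lo, hi = 1.0 / min(u), 1.0 / max(u)   # keeps every 1 - lam * u_i positive
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        g = sum(ri * ui / (1.0 - lam * ui) for ri, ui in zip(r, u))
        if g < 0.0:                        # g is increasing in lam
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    q = [ri / (1.0 - lam * ui) for ri, ui in zip(r, u)]
    divergence = -sum(ri * math.log(qi) for ri, qi in zip(r, q))
    return q, divergence


xs = [-1.0, 0.0, 1.0]
r = [0.25, 0.5, 0.25]                      # symmetric, so EX = 0
q1, d1 = l_projection_on_mean(xs, r, -0.5)
q2, d2 = l_projection_on_mean(xs, r, 0.5)
```

By symmetry the two divergences coincide, so neither projection is preferred. Any posterior mass split between neighborhoods of `q1` and `q2` then yields a posterior mean of $\theta$ strictly between $-0.5$ and $0.5$, which corresponds to neither L-projection.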
In the BST and the BLLN (Theorems 2.1 and 2.2), the univariate $X$ can be 
replaced by a multivariate random variable and the theorems remain valid. 
Consequently, the extension to $\Phi(\Theta)$ constructed by multivariate EE is also 
direct. As an example, consider the linear statistical model $Y = \alpha + \beta X + \epsilon$, 
with stochastic $X$. In EE this is usually approached through estimating 
equations $\Phi(\theta) = \{q(x,y;\theta) : \int q(x,y;\theta)[Y - (\alpha + \beta X)] = 0,\ \int q(x,y;\theta)X[Y - (\alpha + \beta X)] = 0\}$ 
and $\theta = (\alpha,\beta) \in \Theta$, which are based on the Gaussian 
model score equations. The multivariate BLLN shows that the posterior 
asymptotically concentrates on the L-projections of $r$ on $\Phi = \bigcup_\Theta \Phi(\theta)$, and 
EL and MAP comply with the BLLN. 
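
The empirical counterparts of these two estimating equations are easy to evaluate. The sketch below assumes uniform weights $1/n$ by default and checks that the ordinary least-squares coefficients solve both equations, which is why, in this just-identified model, the EL weights are uniform at the least-squares fit. The function names and data are hypothetical.

```python
def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def empirical_moments(xs, ys, a, b, q=None):
    """Weighted sample versions of the two estimating equations."""
    n = len(xs)
    q = q if q is not None else [1.0 / n] * n
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    m1 = sum(qi * e for qi, e in zip(q, resid))
    m2 = sum(qi * x * e for qi, x, e in zip(q, xs, resid))
    return m1, m2


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.1, 7.0]
a_hat, b_hat = ols(xs, ys)
m1, m2 = empirical_moments(xs, ys, a_hat, b_hat)
```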

2.4. BLLN for Polya sampling. In this section we prove the BST and the 
BLLN for a multi-color Polya urn — a simple sampling process where data 
are neither identically nor independently distributed. The theorems can also 
be directly used in a corresponding semiparametric $(0) setting. 

The log-probability of a sample being drawn from a multicolor Polya urn, 
with parameter $c \in \mathbb{Z}$ and initial configuration $q(N) = (\alpha_1, \ldots, \alpha_m)/N$, is 
$\log\Pi(X_1^n\,|\,q(N);c) \triangleq \sum_{i=1}^m\sum_{j=0}^{n_i-1}\log(\alpha_i + jc) - \sum_{j=0}^{n-1}\log(N + jc)$, 
with $n_i$ the number of sampled balls of color $i$; this is meaningful 
if $-nc < \min(\alpha_1, \ldots, \alpha_m)$. We embed the sampling scheme into a 
Bayesian nonparametric setting. To this end, let $\mathcal{P}(\mathcal{X})$ be the set of all PMFs 
with the support $\mathcal{X} = \{x_1, \ldots, x_m\}$. Let $\Phi \subseteq \mathcal{P}(\mathcal{X})$ and let $\Phi(N)$ denote the 
intersection of $\Phi$ with the set of all possible configurations of the $N$-urn. 
Let $\Phi(N)$ be the support of the prior distribution $\Pi(q(N))$ of initial 
configurations $q(N)$. Let $r(N)$ be the true initial configuration, where $r(N)$ is 
not necessarily in $\Phi(N)$. As before, we are interested in the LD asymptotics 
of the posterior distribution $\Pi_n(q(N)\,|\,X_1^n;c)$. Asymptotic investigations of 
posterior consistency will be carried out under the following assumptions: 
(1) $n$ and $N$ go to infinity in such a way that $\beta(n) \triangleq n/N \to \beta \in (0,1)$ as 
$n \to \infty$, and (2) $r(N)$ converges in the total variation metric to $r \in \mathcal{P}(\mathcal{X})$ 
as $n \to \infty$. Topological qualifiers are meant in the topology induced on the 
$m$-dimensional simplex by the usual topology on $\mathbb{R}^m$. 
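
The Polya sampling probability can be evaluated directly. The sketch below assumes the standard urn scheme in which each drawn ball is returned together with $c$ further balls of the same color, and computes the log-probability of one particular sequence with given color counts. For $c > 0$ it also evaluates the equivalent log-Gamma form with $\eta = N/c$. Function and argument names are illustrative.

```python
import math


def polya_log_prob_product(alpha, counts, c):
    """Log-probability of one sequence with colour counts `counts`.

    alpha: initial numbers of balls per colour; c: balls added after each draw
    (c = 0 is i.i.d. sampling, c = -1 sampling without replacement).
    """
    N, n = sum(alpha), sum(counts)
    num = sum(math.log(a + j * c) for a, k in zip(alpha, counts) for j in range(k))
    den = sum(math.log(N + j * c) for j in range(n))
    return num - den


def polya_log_prob_gamma(alpha, counts, c):
    """Same quantity for c > 0, written with log-Gamma functions."""
    N, n = sum(alpha), sum(counts)
    eta = N / c
    out = math.lgamma(eta) - math.lgamma(eta + n)
    for a, k in zip(alpha, counts):
        out += math.lgamma(a / c + k) - math.lgamma(a / c)
    return out


lp_product = polya_log_prob_product([3, 2, 5], [2, 1, 3], 2)
lp_gamma = polya_log_prob_gamma([3, 2, 5], [2, 1, 3], 2)
```

The two forms agree for $c > 0$, and the product form reduces to the familiar multinomial and hypergeometric sequence probabilities for $c = 0$ and $c = -1$.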

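The exponential decay of the posterior is governed by Polya L-divergence. 
For $p, q \in \mathcal{P}(\mathcal{X})$, the Polya L-divergence $L_c^\beta(q\,\|\,p)$ of $q$ with respect to $p$ is 

$$L_c^\beta(q\,\|\,p) \triangleq -\sum_{i=1}^m p_i\log(q_i + \beta c p_i) + \frac{1}{\beta c}\sum_{i=1}^m q_i\log\frac{q_i}{q_i + \beta c p_i}.$$

By the continuity argument, $L_0^\beta(q\,\|\,p) = -\sum_{i=1}^m p_i\log q_i - 1$. The Polya 
$L_c^\beta$-projection $\hat q$ of $p$ on $Q \subseteq \mathcal{P}(\mathcal{X})$ is $\hat q = \arg\inf_{q\in Q} L_c^\beta(q\,\|\,p)$. The value of 
$L_c^\beta$-divergence at an $L_c^\beta$-projection of $p$ on $Q$ is denoted by $L_c^\beta(Q\,\|\,p)$. 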
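
The continuity claim at $c = 0$ can be checked numerically. The sketch evaluates the Polya L-divergence as $-\sum_i p_i\log(q_i + \beta c p_i) + (\beta c)^{-1}\sum_i q_i\log(q_i/(q_i + \beta c p_i))$ and compares it, for a very small $|c|$, with the limiting expression $-\sum_i p_i\log q_i - 1$. The real-valued `c` used here is only a device for probing the limit, and the function names and numbers are illustrative.

```python
import math


def polya_L(q, p, beta, c):
    """Polya L-divergence of q with respect to p, for c != 0.

    Requires q_i + beta * c * p_i > 0 for every i.
    """
    bc = beta * c
    first = -sum(pi * math.log(qi + bc * pi) for qi, pi in zip(q, p))
    second = sum(qi * math.log(qi / (qi + bc * pi)) for qi, pi in zip(q, p)) / bc
    return first + second


def polya_L0(q, p):
    """Limiting form at c = 0: the i.i.d. L-divergence shifted by -1."""
    return -sum(pi * math.log(qi) for qi, pi in zip(q, p)) - 1.0


q = [0.6, 0.2, 0.2]
p = [0.5, 0.3, 0.2]
near_zero = polya_L(q, p, 0.5, 1e-6)
limit = polya_L0(q, p)
```

Because the constant $-1$ does not depend on $q$, minimizing the $c = 0$ form over $q$ selects the same distributions as the i.i.d. L-divergence.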
Theorem 2.4. Let $Q$ be an open set. Let $\beta(n) \to \beta \in (0,1)$ and 
$r(N) \to r$ as $n \to \infty$. Let $L_c^\beta(Q\,\|\,r) < \infty$. Then, for $n \to \infty$, 

$$\frac{1}{n}\log\Pi_n(q(N) \in Q\,|\,X_1^n;c) = -\{L_c^\beta(Q\,\|\,r) - L_c^\beta(\Phi\,\|\,r)\}$$

with probability one. 

Proof. The proof is constructed separately for $c > 0$, $c < 0$ and $c = 0$. 

For $c \ne 0 \wedge \eta q \notin (\mathbb{Z}^-)^m \wedge \eta \notin \mathbb{Z}^-$, $\log\Pi(X_1^n\,|\,q(N);c)$ can equivalently be 
expressed as $\log(\Gamma(\eta)/\Gamma(\eta + n)) + \sum_{i=1}^m \log(\Gamma(\eta q_i + n_i)/\Gamma(\eta q_i))$, where $\Gamma(\cdot)$ is 
the Gamma function and $\eta \triangleq N/c$. For $0 < a < b$, the ratio $\Gamma(b)/\Gamma(a)$ can be 
upper-bounded by $b^{b-1/2}/a^{a-1/2}\,e^{a-b}$ and lower-bounded by $b^{b-1}/a^{a-1}\,e^{a-b}$ 
(cf. [20]). Then, $\Pi_n(q(N) \in Q\,|\,x_1^n;c)$ can be upper-bounded by $U_n$ (dependence 
of $q$ on $N$ is made implicit), 

$$U_n \triangleq \frac{\sum_{q\in Q}\Pi(q)\prod_{i=1}^m e^{-nI(q_i,\,1/(2n))}}{\sum_{q\in\Phi}\Pi(q)\prod_{i=1}^m e^{-nI(q_i,\,1/n)}},$$

and lower-bounded by $L_n$ in a similar way; to get $L_n$ just replace $1/(2n)$ with $1/n$ in 
$U_n$. There, $I(q_i,a) \triangleq -[(\gamma_i + \nu_i^n - a)\log(\gamma_i + \nu_i^n) - (\gamma_i - a)\log\gamma_i]$, 
$\gamma_i \triangleq q_i/(\beta(n)c)$, $a \in \{1/n, 1/(2n)\}$, and $\nu^n$ is the empirical measure induced by the 
sample $X_1^n$. 

Next, we use simple bounds to upper bound $U_n$ by $\bar U_n$, 

$$\bar U_n \triangleq \frac{\prod_{i=1}^m e^{-nI(\hat q_i(Q,\,1/(2n)),\,1/(2n))}}{\Pi(\hat q(\Phi,1/n))\prod_{i=1}^m e^{-nI(\hat q_i(\Phi,\,1/n),\,1/n)}},$$

and to lower bound $L_n$ by $\underline{L}_n$; to get $\underline{L}_n$ just replace $1/(2n)$ with $1/n$ in $\bar U_n$. 
There, $\hat q(\cdot,a) = \arg\inf_{q\in\cdot}\sum_{i=1}^m I(q_i;a)$. 

By the Strong Law of Large Numbers for Polya sampling, $\nu^n \to r$, almost 
surely, as $n \to \infty$. The Polya L-divergence is continuous in $q$ and $Q$ 
is open, by assumption. Thus, $\frac{1}{n}\log\bar U_n$ converges, with probability one, to 
$-\{L_c^\beta(Q\,\|\,r) - L_c^\beta(\Phi\,\|\,r)\}$, as $n \to \infty$. This is the same as the "point" of 
almost sure convergence of $\frac{1}{n}\log\underline{L}_n$, and the theorem for $c > 0$ is thus proven. 

For $c \ne 0 \wedge (1 - \eta q) \notin (\mathbb{Z}^-)^m \wedge (1 - \eta) \notin \mathbb{Z}^-$, $\log\Pi(X_1^n\,|\,q(N);c)$ can 
equivalently be expressed as 
$\log(\Gamma(1 - \eta - n)/\Gamma(1 - \eta)) + \sum_{i=1}^m \log(\Gamma(1 - \eta q_i)/\Gamma(1 - \eta q_i - n_i))$. 
The proof then can be constructed along the same lines as for 
$c > 0$. The case of $c = 0$ is straightforward. □ 
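
The two bounds on the Gamma ratio used in the proof can be probed numerically on the log scale. The sketch compares $\log\Gamma(b) - \log\Gamma(a)$ with $(b-1)\log b - (a-1)\log a + a - b$ from below and with $(b-\tfrac12)\log b - (a-\tfrac12)\log a + a - b$ from above. The test points and the function name are arbitrary illustrative choices.

```python
import math


def log_gamma_ratio_bounds(a, b):
    """Return (lower, exact, upper) for log(Gamma(b) / Gamma(a)), 0 < a < b."""
    exact = math.lgamma(b) - math.lgamma(a)
    lower = (b - 1.0) * math.log(b) - (a - 1.0) * math.log(a) + a - b
    upper = (b - 0.5) * math.log(b) - (a - 0.5) * math.log(a) + a - b
    return lower, exact, upper


checks = [log_gamma_ratio_bounds(a, b)
          for a, b in [(2.0, 5.0), (1.5, 1.7), (0.3, 0.9), (10.0, 10.5)]]
bounds_hold = all(lo < ex < up for lo, ex, up in checks)
```

The two bounds differ only in replacing the exponent offset $1$ by $1/2$, which is what produces the $1/n$ versus $1/(2n)$ corrections in $U_n$ and $L_n$ after dividing by $n$.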

From the Polya BST (Theorem 2.4), the BLLN for Polya sampling directly 
follows. It is worth noting that the MNPL in Polya sampling can be 
constructed in two ways: either via maximization of $\Pi(X_1^n\,|\,q(N);c)$, or by 
maximization of the negative of $L_c^\beta(q\,\|\,\nu^n)$ with regard to $q$, where $\nu^n$ is the 
empirical PMF induced by the sample $X_1^n$. The methods could be called "exact" 
and "asymptotic" MNPL, respectively. Both methods comply with the 
Polya BLLN, as does the Bayesian MAP. 
2.5. BST for right-censored data. Right-censoring of a r.v. $X$ by a r.v. 
$Y$ [both on $(\mathbb{R}, \mathcal{B})$] can be described by the following hierarchical model: 
$\delta \sim \mathrm{Ber}(\alpha)$, $\alpha = \int F_0(y)\,dG_0(y)$; if $\delta = 0$, then $X \sim F_0$; if $\delta = 1$, then 
$X = (Y, \infty)$ where $Y \sim G_0$; $X$'s are conditionally independent. A Bayesian puts a 
positive prior over the set $\Phi$ of distributions of $X$. Let the prior over 
distributions of $Y$ be concentrated at $G_0$. We are interested in the exponential 
decay of the posterior 

$$\Pi_n(F \in Q\,|\,X_1^n) = \frac{\int_Q e^{-l_n(F,n_1)}\,\Pi(dF)}{\int_\Phi e^{-l_n(F,n_1)}\,\Pi(dF)},$$

where $l_n(F,n_1) \triangleq -\sum_{i:\delta_i=0}\log F(\{x_i\}) - \sum_{i:\delta_i=1}\log F((y_i,\infty))$, $Q \subseteq \Phi$; $F_0$ 
is not necessarily in $\Phi$, and $n_1$ is the number of noncensored data, out of $n$ 
observations. The decay is governed by the L-divergence of $F$ with regard 
to $(F_0, G_0)$ for right-censoring 

$$L(F\,\|\,(F_0,G_0)) = -\left[\alpha\int\log F(x)\,dF_0(x) + (1-\alpha)\int\log F((y,\infty))\,dG_0(y)\right].$$

The L-projection $\hat F$ of $(F_0, G_0)$ on $Q \subseteq \mathcal{P}$ is $\hat F = \arg\inf_{F\in Q} L(F\,\|\,(F_0,G_0))$, 
and $L(Q\,\|\,(F_0,G_0))$ denotes the value of the L-divergence at an L-projection 
of $F_0$ on $Q$. Let $B_\varepsilon(Q,F_0) \triangleq \{F \in \mathcal{P} : L(F\,\|\,(F_0,G_0)) - L(Q\,\|\,(F_0,G_0)) < \varepsilon\}$. 
The BST for right-censoring follows. 



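Theorem 2.5. Let $X_1^n$ be right-censored data generated by the above 
model. Let $Q$, $\Phi$ be open in weak topology; $Q \subset \Phi \subseteq \mathcal{P}$. Let 
$L(Q\,\|\,(F_0,G_0)) < \infty$, and for any $\varepsilon > 0$, let $\Pi(B_\varepsilon(Q,F_0)) > 0$ and 
$\Pi(B_\varepsilon(\Phi,F_0)) > 0$. Then, for $n \to \infty$, 

$$\frac{1}{n}\log\Pi_n(F \in Q\,|\,X_1^n) = -\{L(Q\,\|\,(F_0,G_0)) - L(\Phi\,\|\,(F_0,G_0))\}$$

with probability one. 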

Proof. Note that $\frac{1}{n}l_n(F,n_1)$ converges to $L(F\,\|\,(F_0,G_0))$, with 
probability 1, by the SLLN. Arguments go along the lines of the proof of Theorem 
2.1. □ 



From the BST (Theorem 2.5), the BLLN follows for right-censored data in 
the same way as it does for the i.i.d. case from Theorem 2.1. The BLLN for 
right-censoring demonstrates that the posterior concentrates on weak 
neighborhoods of the L-projections of $(F_0, G_0)$ on $\Phi$ if the $\varepsilon$-balls 
$B_\varepsilon(F,(F_0,G_0)) \triangleq \{F' \in \mathcal{P} : L(F'\,\|\,(F_0,G_0)) - L(F\,\|\,(F_0,G_0)) < \varepsilon\}$ have positive 
prior probability. This, together with the assumption $L(\Phi\,\|\,(F_0,G_0)) < \infty$, forms 
the Schwartz conditions for right censoring. 
Under the Schwartz conditions, a set of Bayesian MAP estimators 
$\hat F_{MAP} = \arg\sup_{F\in\Phi}\Pi_n(F\,|\,X_1^n)$ asymptotically coincides with a set of L-projections 
of $(F_0, G_0)$ on $\Phi$. The same holds true for the MNPL/EL estimator 
$\hat F_{EL} = \arg\inf_{F\in\Phi} l_n(F,n_1)$. The Kaplan-Meier estimator follows from $\hat F_{EL}$ in the 
standard way (cf. [31]). Thus, the BLLN makes it possible to view the 
Kaplan-Meier estimator as an asymptotic instance of the Bayesian MAP, 
and provides a probabilistic underpinning. The only available Bayesian view 
of the Kaplan-Meier estimator seems to be that of Susarla and van Ryzin 
[37]. In [37], a Dirichlet process prior was considered in a well-specified 
model, and it was shown there that the posterior mean converges to the 
Kaplan-Meier estimator as the parameter $\alpha$ of the Dirichlet process 
converges to 0. 
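
For reference, the Kaplan-Meier estimator itself is short to state in code. The sketch below uses the standard product-limit formula $S(t) = \prod_{t_k \le t}(1 - d_k/n_k)$, with $d_k$ events and $n_k$ subjects at risk at the distinct event time $t_k$. Note that the flag `observed` is `True` for a noncensored observation, which is the opposite coding to the indicator $\delta$ above; the function name and data are illustrative.

```python
def kaplan_meier(times, observed):
    """Product-limit survival estimate at each distinct event time.

    times: observation times; observed[i] is True for an event and
    False for a right-censored observation.
    Returns a list of (t, S(t)) pairs in increasing order of t.
    """
    event_times = sorted({t for t, d in zip(times, observed) if d})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        events = sum(1 for s, d in zip(times, observed) if d and s == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Without censoring the estimate reduces to the empirical survival function, which matches the purely nonparametric MNPL case.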